/*******************************************************************************
File    : treatment.do
Project : Bank Expansion and Moneylender Interest Rates - RDD Evidence from India
Purpose : Create RBI underbanked treatment indicator by matching RBI circular
          underbanked district list to Census 2001 districts (reclink).
Author  : Kannan Narayanaswamy
Updated : 09 Feb 2026
Inputs  : Data\MOF_Data\Census 2001\Census_2001.dta
          Data\MOF_Data\Census 2001\Underbanked Districts.xlsx
Output  : Data\MOF_Data\Census 2001\Census_UB.dta
Notes   : reclink matches master (Census) to using (RBI list). We also verify
          that every using observation is matched (one-to-one here).
*******************************************************************************/
*==============================================================*
* 0. Setup: Paths and base Census file                         *
*==============================================================*
cd "`c(pwd)'/Data/MOF_Data/Census 2001"

use "Census_2001.dta", clear
tempfile census
save `census'
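
* Optional pre-flight checks (a sketch, not part of the original workflow).
* reclink is a user-written command, and its idmaster() variable must
* uniquely identify the Census rows. Both are verified here so the script
* stops early with a clear message rather than failing in Section 3.
capture which reclink
if _rc {
    display as error "reclink not found; try: ssc install reclink"
    exit 199
}
isid ID_Census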

*==============================================================*
* 1. Import RBI underbanked list and basic name cleaning       *
*==============================================================*
import excel "Underbanked Districts.xlsx", ///
    sheet("Underbanked Districts") firstrow clear

* Standardize capitalization to improve matching
replace State_UT = strproper(State_UT)
replace District = strproper(District)

* Harmonize RBI district spellings with those used in the Census 2001 file
replace District = "Nuapada"  if District == "Nawapara"
replace District = "Dohad"    if District == "Dahod"
replace District = "Palamu"   if District == "Palamau"

*==============================================================*
* 2. Map carved-out (child) districts back to 2001 parents     *
* (districts created between Census 2001 and the 2005 circular)*
*==============================================================*
gen Parent_District = District

* Both parent and child in the list
replace Parent_District = "Lohit"              if District == "Anjaw"
replace Parent_District = "East Nimar"         if District == "Burhanpur"
replace Parent_District = "Lower Subansiri"    if District == "Kurung Kumey"
replace Parent_District = "Dibang Valley"      if District == "Lower Dibang Vally"
replace Parent_District = "Jehanabad"          if District == "Arwal"
replace Parent_District = "Palamu"             if District == "Latehar"
replace Parent_District = "Dumka"              if District == "Jamtara"
replace Parent_District = "Paschimi Singhbhum" if District == "Saraikalan"
replace Parent_District = "Shahdol"            if District == "Anuppur"
replace Parent_District = "Guna"               if District == "Ashoknagar"
replace Parent_District = "Tuensang"           if inlist(District,"Khirpe","Longleng")
replace Parent_District = "Kohima"             if District == "Peren"
replace Parent_District = "Dharmapuri"         if District == "Krishnagiri"

* Only a child district appears in the list
replace Parent_District = "Kamrup"             if District == "Kamrup Metropolitan"
replace Parent_District = "Medinpur_old"       if District == "Paschim Medinipur"

drop District
duplicates drop State_UT Parent_District, force

rename Parent_District District
gen ID_UB = _n   // unique ID for underbanked list

tempfile underbanked_list
save `underbanked_list'

*==============================================================*
* 3. Fuzzy match RBI list to Census 2001 districts (reclink)   *
*==============================================================*
use `census', clear

* reclink: master = Census, using = underbanked list
reclink State_UT District using `underbanked_list', ///
    idmaster(ID_Census) idusing(ID_UB) gen(match_score)
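
* Optional review step (a sketch, not part of the original workflow):
* list matched pairs whose score is below 1 (assuming exact matches score 1)
* so doubtful pairings can be checked by eye before any manual override.
list State_UT District UState_UT UDistrict match_score ///
    if _merge == 3 & match_score < 1, noobs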

* Two false-positive fuzzy matches are manually reset to unmatched
replace UDistrict = "" if District == "Kanpur Nagar"
replace _merge    = 1  if District == "Kanpur Nagar"

replace UDistrict = "" if District == "Purbi Singhbhum"
replace _merge    = 1  if District == "Purbi Singhbhum"
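
* Optional check (a sketch, not in the original workflow): display the two
* overridden rows to confirm each is now flagged as unmatched (_merge == 1).
list State_UT District UDistrict _merge ///
    if inlist(District, "Kanpur Nagar", "Purbi Singhbhum"), noobs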

*--------------------------------------------------------------*
* 3a. Check that no using observations are unintentionally     *
*     left unmatched (reclink output keeps master rows only)   *
*--------------------------------------------------------------*
preserve
    drop if _merge == 1          // drop unmatched Census rows; keep matches
    drop _merge

    * Merge back to the underbanked list via ID_UB to see which
    * underbanked districts (using-file) did not get a match
    merge 1:1 ID_UB using `underbanked_list', keepusing(State_UT District)

    * Inspect underbanked districts that found no Census match
    tab District if _merge == 2      // This was 0
    duplicates report ID_UB          // This was also 0
    * This means that the reclink match was a one-to-one mapping:
    * exactly 583 districts to 583 districts
restore

*==============================================================*
* 4. Construct treatment indicator and clean workspace         *
*==============================================================*
gen Underbanked = 1 if _merge == 3     // matched Census–RBI: treated
replace Underbanked = 0 if _merge == 1 // unmatched Census: not underbanked
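
* Optional sanity check (a sketch, not in the original workflow): reclink
* should leave only _merge values 1 and 3, so every Census district is
* expected to have Underbanked equal to 0 or 1. assert stops the run if not.
assert inlist(Underbanked, 0, 1)
tab Underbanked, missing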

drop UState_UT UDistrict no_villages_inhabited no_villages_uninhabited ///
     Numberoftowns Numberofhouseholds pop_male pop_female ///
     match_score ID_UB _merge

save Census_UB, replace

cd ../../..